AITopics | beta policy

Collaborating Authors

beta policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Soft Actor-Critic with Beta Policy via Implicit Reparameterization Gradients

Della Libera, Luca

arXiv.org Artificial IntelligenceSep-8-2024

Recent advances in deep reinforcement learning have achieved impressive results in a wide range of complex tasks, but poor sample efficiency remains a major obstacle to real-world deployment. Soft actor-critic (SAC) mitigates this problem by combining stochastic policy optimization and off-policy learning, but its applicability is restricted to distributions whose gradients can be computed through the reparameterization trick. This limitation excludes several important examples such as the beta distribution, which was shown to improve the convergence rate of actor-critic algorithms in high-dimensional continuous control problems thanks to its bounded support. To address this issue, we investigate the use of implicit reparameterization, a powerful technique that extends the class of reparameterizable distributions. In particular, we use implicit reparameterization gradients to train SAC with the beta policy on simulated robot locomotion environments and compare its performance with common baselines. Experimental results show that the beta policy is a viable alternative, as it outperforms the normal policy and is on par with the squashed normal policy, which is the go-to choice for SAC. The code is available at https://github.com/lucadellalib/sac-beta.

beta policy, implicit reparameterization gradient, soft actor-critic

arXiv.org Artificial Intelligence

2409.04971

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Add feedback

Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

Suttle, Wesley A., Sharma, Vipul K., Kosaraju, Krishna C., Sivaranjani, S., Liu, Ji, Gupta, Vijay, Sadler, Brian M.

arXiv.org Artificial IntelligenceMar-6-2024

We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks.

barrier function, experiment, sampling-based safe reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2403.04007

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.82)

Industry: Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Market Making with Scaled Beta Policies

Jerome, Joseph, Palmer, Gregory, Savani, Rahul

arXiv.org Artificial IntelligenceSep-27-2022

This paper introduces a new representation for the actions of a market maker in an order-driven market. This representation uses scaled beta distributions, and generalises three approaches taken in the artificial intelligence for market making literature: single price-level selection, ladder strategies and "market making at the touch". Ladder strategies place uniform volume across an interval of contiguous prices. Scaled beta distribution based policies generalise these, allowing volume to be skewed across the price interval. We demonstrate that this flexibility is useful for inventory management, one of the key challenges faced by a market maker. In this paper, we conduct three main experiments: first, we compare our more flexible beta-based actions with the special case of ladder strategies; then, we investigate the performance of simple fixed distributions; and finally, we devise and evaluate a simple and intuitive dynamic control policy that adjusts actions in a continuous manner depending on the signed inventory that the market maker has acquired. All empirical evaluations use a high-fidelity limit order book simulator based on historical data with 50 levels on each side.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2207.03352

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
North America > United States > Texas (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

Proximal Policy Optimization with Continuous Bounded Action Space via the Beta Distribution

Petrazzini, Irving G. B., Antonelo, Eric A.

arXiv.org Artificial IntelligenceNov-3-2021

Reinforcement learning methods for continuous control tasks have evolved in recent years generating a family of policy gradient methods that rely primarily on a Gaussian distribution for modeling a stochastic policy. However, the Gaussian distribution has an infinite support, whereas real world applications usually have a bounded action space. This dissonance causes an estimation bias that can be eliminated if the Beta distribution is used for the policy instead, as it presents a finite support. In this work, we investigate how this Beta policy performs when it is trained by the Proximal Policy Optimization (PPO) algorithm on two continuous control tasks from OpenAI gym. For both tasks, the Beta policy is superior to the Gaussian policy in terms of agent's final expected reward, also showing more stability and faster convergence of the training process. For the CarRacing environment with high-dimensional image input, the agent's success rate was improved by 63% over the Gaussian policy.

action space, agent, beta distribution, (14 more...)

arXiv.org Artificial Intelligence

2111.02202

Country:

South America > Brazil > Santa Catarina > Florianópolis (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Revisiting Design Choices in Proximal Policy Optimization

Hsu, Chloe Ching-Yun, Mendler-Dünner, Celestine, Hardt, Moritz

arXiv.org Machine LearningSep-22-2020

Proximal Policy Optimization (PPO) is a popular deep policy gradient algorithm. In standard implementations, PPO regularizes policy updates with clipped probability ratios, and parameterizes policies with either continuous Gaussian distributions or discrete Softmax distributions. These design choices are widely accepted, and motivated by empirical performance comparisons on MuJoCo and Atari benchmarks. We revisit these practices outside the regime of current benchmarks, and expose three failure modes of standard PPO. We explain why standard design choices are problematic in these cases, and show that alternative choices of surrogate objectives and policy parameterizations can prevent the failure modes. We hope that our work serves as a reminder that many algorithmic design choices in reinforcement learning are tied to specific simulation environments. We should not implicitly accept these choices as a standard part of a more general algorithm.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Machine Learning

2009.10897

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Mexico > Gulf of Mexico (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback